Automated Entropy Value Frequency (AEVF) Algorithm for Outlier Detection in Categorical Data
Author
Abstract
Outlier detection is an important concept in data mining. Its aim is to find objects that deviate from the norm. Outlier detection has many applications, from network security to credit fraud detection. However, most outlier detection algorithms focus on numerical data and do not perform well when applied to categorical data. In this paper, we propose an automated outlier detection algorithm that caters specifically to categorical data.
Key-Words: Outlier Detection, Entropy, Categorical Data, Numerical Data
Similar resources
Outlier Analysis of Categorical Data using NAVF
Introduction: Outlier analysis is an important research field with many applications, such as credit card fraud detection, network intrusion detection, and medicine. This analysis concentrates on detecting infrequent data records in a dataset. Most existing systems concentrate on numerical or ordinal attributes. Sometimes categorical attribute values can be converted into numerical valu...
EMPWC: Expectation Maximization with Particle Swarm Optimization based Weighted Clustering for Outlier Detection in Large Scale Data
Outlier detection is usually considered a pre-processing step for locating, in a data set, those objects that do not conform to well-defined notions of expected behaviour. It is very important in data mining for discovering novel or rare events, anomalies, vicious actions, exceptional phenomena, etc. However, investigation of outlier detection for categorical data sets is especially a challen...
Initialization of K-modes clustering using outlier detection techniques
The K-modes clustering has received much attention, since it works well for categorical data sets. However, the performance of K-modes clustering is especially sensitive to the selection of initial cluster centers. Therefore, choosing the proper initial cluster centers is a key step for K-modes clustering. In this paper, we consider the initialization of K-modes clustering from the view of outl...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased parameter estimates, and inaccurate predictions. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Outlier Detection in Complex Categorical Data by Modeling the Feature Value Couplings
This paper introduces a novel unsupervised outlier detection method, namely Coupled Biased Random Walks (CBRW), for identifying outliers in categorical data with diversified frequency distributions and many noisy features. Existing pattern-based outlier detection methods are ineffective in handling such complex scenarios, as they misfit such data. CBRW estimates outlier scores of feature values...